Efficient counting of k-mers in DNA sequences using a bloom filter

نویسندگان

چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The veracious counting bloom filter

Counting Bloom Filters (CBFs) are widely employed in many applications for fast membership queries. CBF works on dynamic sets rather than a static set via item insertions and deletions. CBF allows false positive, but not false negative. The Bh-Counting Bloom Filter (Bh-CBF) and Variable Increment Counting Bloom Filter (VI-CBF) are introduced to reduce the false positive probability, but they su...

متن کامل

Classification of DNA sequences using Bloom filters

MOTIVATION New generation sequencing technologies producing increasingly complex datasets demand new efficient and specialized sequence analysis algorithms. Often, it is only the 'novel' sequences in a complex dataset that are of interest and the superfluous sequences need to be removed. RESULTS A novel algorithm, fast and accurate classification of sequences (FACSs), is introduced that can a...

متن کامل

Improving counting Bloom filter performance with fingerprints

a r t i c l e i n f o a b s t r a c t Bloom filters (BFs) are used in many applications for approximate check of set membership. Counting Bloom filters (CBFs) are an extension of BFs that enable the deletion of entries at the cost of additional storage requirements. Several alternatives to CBFs can be used to reduce the storage overhead. For example schemes based on d-left hashing or Cuckoo has...

متن کامل

These Are Not the K-mers You Are Looking For: Efficient Online K-mer Counting Using a Probabilistic Data Structure

K-mer abundance analysis is widely used for many purposes in nucleotide sequence analysis, including data preprocessing for de novo assembly, repeat detection, and sequencing coverage estimation. We present the khmer software package for fast and memory efficient online counting of k-mers in sequencing data sets. Unlike previous methods based on data structures such as hash tables, suffix array...

متن کامل

A fast, lock-free approach for efficient parallel counting of occurrences of k-mers

MOTIVATION Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: BMC Bioinformatics

سال: 2011

ISSN: 1471-2105

DOI: 10.1186/1471-2105-12-333